Abstract

Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness [4, 13, 19]. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of “ghost” traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.


1 Introduction 

Many NASA projects under consideration involve constellations of data-communication relay satellites. In such a constellation, each satellite receives some amount of data at each time step (e.g., uplinked from Earth or Mars, depending on where it is orbiting) and needs to relay this data back to an ultimate destination (e.g., Earth) with minimal loss. Although each satellite may have a direct link to the ultimate destination at particular times, because of various limitations (e.g., storage, power, bandwidth) it may still be preferable to route the data across other satellites in the network. Furthermore, the data is likely to have different levels of importance, and the routing algorithm needs to account for that possibility. For such problems, a suitable utility function to minimize is the total loss of importance-weighted data across the network.


Link capacities are expressed as amounts of data transmitted in a single time step. We represent Earth as a “special” satellite $s_0$ with $c_0 = \infty$. At each time step $t$, new data $y_{ijt}$ of importance $j$ is introduced to the system at satellite $i$ (this corresponds to the uploading of data from, e.g., the surface of Mars). We sum the $y_{ijt}$'s over $j$ and add this total to the total amount of data sent to satellite $i$ from all other satellites to give the total influx of data $x_{it}$ at this time step. If this total is greater than the available storage capacity (given by $c_i - r_{it}$, where $r_{it}$ is the amount of unsent data left on the disk from the previous time step), then the difference between these two numbers is the amount of data lost at satellite $i$ at time $t$. We assume that the same proportion of data is dropped for each importance level, since once the disk is full the satellite is unable to examine any data sent to it and determine its priority. Define $l_{ijt}$ to be the amount of data of importance $j$ dropped at satellite $i$ at time $t$, and define $w_j$ as the cost of dropping data of importance $j$. Then the objective function we wish to maximize is

$$G = \frac{\sum_t \sum_i \sum_j w_j \left( y_{ijt} - l_{ijt} \right)}{\sum_t \sum_i \sum_j w_j \, y_{ijt}},$$

the importance-weighted fraction of data delivered to Earth (i.e., not dropped) by the system as a whole.
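The loss rule and the objective can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: `drop_at_satellite` and `world_utility` are hypothetical names, the incoming data is assumed to be tracked separately for each importance level, and `weights[j]` stands for the per-level cost $w_j$.

```python
def drop_at_satellite(incoming, capacity, leftover):
    """Data of each importance level dropped at satellite i in one time step.

    incoming[j]: new uploads y_ijt plus data of importance j routed in from
        other satellites (tracking inflow per level is an assumption).
    capacity: storage capacity c_i; leftover: unsent data r_it from step t-1.
    """
    influx = sum(incoming)                # x_it
    free = capacity - leftover            # c_i - r_it
    overflow = max(0.0, influx - free)    # total data lost at satellite i
    if influx == 0:
        return [0.0] * len(incoming)
    # A full disk cannot inspect priorities, so every level loses the same fraction.
    fraction = overflow / influx
    return [fraction * x for x in incoming]


def world_utility(weights, introduced, dropped):
    """Importance-weighted fraction of introduced data that was not dropped.

    weights[j]: assumed cost w_j of dropping data of importance j.
    introduced[t][i][j] = y_ijt and dropped[t][i][j] = l_ijt.
    """
    kept = total = 0.0
    for y_t, l_t in zip(introduced, dropped):
        for y_i, l_i in zip(y_t, l_t):
            for w, y, l in zip(weights, y_i, l_i):
                kept += w * (y - l)
                total += w * y
    return kept / total if total > 0 else 1.0
```

For example, a satellite with 10 units of storage, 2 units left over, and an influx of 6 + 4 units overflows by 2 units, so about 20% of each importance level is dropped.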

The base routing algorithm has rough parallels with the shortest-path-style routing algorithms commonly used in Internet routing [2, 3, 7]. Each satellite $i$ evaluates a potential decision to send to satellite $k$ by estimating the headroom of the optimal path to Earth beginning at $k$. The headroom represents the available room for additional data, given the available disk room on each satellite and the capacity of each link between them (the headroom replaces the traditional “delay” concept in data routing). Denote a path by a sequence of satellites $p = (s_{k_1}, \ldots, s_{k_p})$, where $s_{k_1}$ represents the originating satellite and $s_{k_p}$ is the satellite which ultimately sends the packet to Earth. Let the current amount of data being stored at satellite $i$ be $v_i$. Then the headroom $H(p)$ of a path $p$ is given by:


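$$H(p) = \min\Big( \min_{1 \le n \le p-1} b_{k_n k_{n+1}},\; b_{k_p 0},\; \min_{1 \le n \le p} \big( c_{k_n} - v_{k_n} \big) \Big)$$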
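As a minimal sketch, the headroom of a single given path can be computed directly. This assumes that the headroom counts the free disk space $c_k - v_k$ of every satellite on the path, every inter-satellite link, and the final downlink to Earth (index 0); the function name and the adjacency-matrix layout are hypothetical.

```python
def path_headroom(path, capacity, stored, bandwidth, earth=0):
    """Headroom H(p) of a path p = (s_k1, ..., s_kp) ending in a downlink to Earth.

    capacity[k], stored[k]: c_k and v_k; bandwidth[a][b]: link capacity b_ab.
    Counting the final downlink bandwidth[k_p][earth] is an assumption.
    """
    rooms = [capacity[k] - stored[k] for k in path]              # free disk space
    links = [bandwidth[a][b] for a, b in zip(path, path[1:])]    # inter-satellite hops
    links.append(bandwidth[path[-1]][earth])                     # final hop to Earth
    return min(rooms + links)
```

The path's headroom is simply its tightest bottleneck, whether that is a nearly full disk or a narrow link.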
The presumption is that a path with high headroom should be favored over one with low headroom, since the likelihood of data being dropped is lower (just as a path with low expected delays is favored over a path with high delays in traditional data routing). Note that in a real system, a particular satellite $i$ would not have access to the precise storage amounts $v_j$, $j \neq i$, at the current instant in time. Hence, the headroom values would have to be estimated by the satellites (in ways similar to how delays are estimated in traditional data routing). In these experiments, we supply each satellite with the $v_j$'s. Since we are interested in how to improve the performance of the base algorithm, and estimating the $v_j$'s would introduce a systematic error that would not affect the ranking of the algorithms discussed below, we omit this estimation step.

It is straightforward to calculate the maximum headroom path from each satellite to Earth using a version of Dijkstra's shortest path algorithm [3, 9]. One very simple strategy would be for each satellite to send all of its data to the first satellite along the maximum headroom path to Earth. This technique may not be optimal, however, if the amount of data the satellite sends is large relative to the headroom of the maximum headroom path. In particular, if the amount of data sent is larger than the headroom, then it is quite likely that some of that data will be dropped, unless more headroom happens to “clear up” fortuitously. It therefore seems wiser to split the data up and send it through more than one satellite.
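A minimal sketch of the maximum-headroom computation: a widest-path (bottleneck) variant of Dijkstra's algorithm, run backwards from Earth so that a single pass yields $H_i$ for every satellite. It assumes an adjacency matrix in which 0 marks a missing link and, as in the headroom formula, counts each satellite's free disk space and the final downlink; `max_headroom` is a hypothetical name.

```python
import heapq
import math


def max_headroom(capacity, stored, bandwidth):
    """Max-headroom (widest-path) variant of Dijkstra, run backwards from Earth.

    Index 0 is Earth. capacity[i], stored[i]: c_i and v_i.
    bandwidth[i][j]: link capacity b_ij, with 0 meaning "no link" (an assumption).
    Returns H, where H[i] is the headroom of the best path from satellite i to Earth.
    """
    n = len(capacity)
    room = [math.inf] + [capacity[i] - stored[i] for i in range(1, n)]
    H = [0.0] * n
    H[0] = math.inf
    done = [False] * n
    heap = [(-H[0], 0)]  # heapq is a min-heap, so headrooms are negated
    while heap:
        neg_h, j = heapq.heappop(heap)
        if done[j]:
            continue  # stale entry superseded by a wider path
        done[j] = True
        for i in range(1, n):
            if done[i] or bandwidth[i][j] <= 0:
                continue
            # Bottleneck of "i -> j -> ... -> Earth": the tightest disk or link.
            candidate = min(room[i], bandwidth[i][j], -neg_h)
            if candidate > H[i]:
                H[i] = candidate
                heapq.heappush(heap, (-candidate, i))
    return H
```

Because a path's bottleneck can only shrink as hops are added, Dijkstra's greedy "settle the widest frontier node first" order remains valid when the usual sum of delays is replaced by a minimum.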



Because of the nature of the utility function to be optimized, traditional routing algorithms (e.g., shortest path algorithms [2, 3, 9]) are ill-suited for this problem. We therefore develop a baseline routing algorithm that addresses the needs of this utility function. We then introduce the concept of “ghost” traffic, which distorts how the data traffic appears to the satellites and induces them to take actions that are beneficial to the system as a whole. The objective then reduces to setting this ghost traffic properly. Although one may use search algorithms such as simulated annealing [1, 5, 8] to set the ghost traffic, this problem naturally lends itself to a Reinforcement Learning (RL) [11, 16] approach.

The use of RL has proved successful in a multitude of optimization problems [4, 6, 12, 13, 14, 18, 19]. Furthermore, in a different context, distributed RL, where many agents independently attempt to maximize a world utility (e.g., “team games”), has been successfully used to solve large decentralized problems [6, 10, 13, 15]. In this article we combine these two concepts into a “guided” search algorithm that uses distributed RL to improve upon both traditional team game solutions and traditional search algorithms.

In this article we present a search algorithm that combines team games and hill climbing, and apply it to minimizing the loss of information in a constellation of communication satellites. In Section 2 we detail the problem domain and introduce the concept of “ghost” traffic. Then, in Section ??, we present the bi-utility search algorithm and discuss both its motivation and applicability. Finally, in Section 3 we present experimental results demonstrating the superiority of the bi-utility search algorithm over team games, simulated annealing, and hill climbing algorithms.

2 Constellations of Satellites 

One of the key challenges in the design of constellations of communication satellites is the development of routing algorithms that minimize the amount of data lost by the system as a whole. In such constellations, each satellite's storage capacity, downlink bandwidth, and available power are certain to be limited. If a satellite's disk becomes full, then any further incoming data will be lost. This predicament can potentially be avoided if the satellite clears storage space by sending some of its data to a neighboring satellite with more room on its disk and/or a larger-bandwidth link to Earth. In general, in order for the entire constellation to minimize loss, data packets may need to be routed through several satellites before ultimately being delivered to Earth. Further complicating this task is the fact that different data packets may have different levels of importance. Routing decisions should reflect this variability in priorities.

This task can be characterized by a well-specified global objective function (minimize the importance-weighted amount of data dropped), and yet it is fundamentally decentralized in nature, since it is infeasible to disseminate routing decisions from a single, centralized source. Furthermore, some sort of adaptivity is clearly required, since the complexity and potential non-stationarity of the problem are certain to make any hand-designed scheme both brittle and sub-optimal.

2.1 Model Description 

We model the evolution of the system as a sequence of discrete time steps. A constellation of satellites is specified by the following set of parameters: each satellite $s_i$ has a storage capacity $c_i$ and a link capacity (bandwidth) $b_{ij}$ to each other satellite $s_j$, where $c_i$ and $b_{ij}$ are real numbers indicating the amount of data the satellite can store and the amount of data it can transmit to satellite $s_j$ in a single time step, respectively.


2.2 Baseline Routing Algorithm 


Let $H_i$ be the headroom of the max headroom path from satellite $i$ to Earth. Let $H_{ij}$ be the headroom of the optimal path originating at satellite $i$ whose first hop is to satellite $j$. $H_{ij}$ is therefore given by

$$H_{ij} = \min(b_{ij}, H_j)$$

The satellite at which data originates does not decide on the full path to Earth taken by its packets; it simply decides on the first hop in the path and sends each chunk to the appropriate satellite based on the $H_{ij}$'s (similarly, in traditional data routing, a router only selects the first hop along the seemingly shortest path, based on the delays). The routing is performed as follows. Let $v_{ik}$ be the amount of data of importance $k$ currently at satellite $i$, and define

$$j^* = \arg\max_j H_{ij}.$$

If $H_{ij^*} \geq v_{ik}$, then all of $v_{ik}$ is sent to satellite $j^*$, and $H_{ij^*}$ is updated by subtracting $v_{ik}$ from the headroom estimate to reflect the fact that that much data has already been sent to that satellite. If $H_{ij^*} < v_{ik}$, then an amount $H_{ij^*}$ is sent to $j^*$, $v_{ik}$ is reduced by that amount, and $H_{ij^*}$ is set to zero.

The whole procedure is then repeated until either (1) $v_{ik} = 0$ or (2) $H_{ij} = 0$ for all $j$. If the second condition occurs before all data has been routed, then the remaining data is not sent anywhere and is instead kept on the disk until the next time step, in the hope of routing it successfully then.

The routing algorithm factors in the importance levels of different data by perform- 
ing the routing for the highest importance data first, and then successively moving 
down to lower and lower importance levels until either all the data has been routed 
or all the headroom estimates are zero. 
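The first-hop procedure and the importance ordering can be sketched as the following greedy loop for one satellite and one time step. The function name, the dictionary layout, and the convention that a larger `k` means higher importance are assumptions; ties in the argmax are broken arbitrarily, and non-positive headroom estimates are treated as exhausted.

```python
def route_satellite(data_by_importance, first_hop_headroom):
    """Greedy first-hop routing for one satellite i in one time step.

    data_by_importance: {k: v_ik}; larger k is assumed to mean more important.
    first_hop_headroom: {j: H_ij}; a copy is consumed as data is assigned.
    Returns (sends, kept): sends[(j, k)] is the amount of importance-k data
    sent to satellite j; kept[k] is what stays on the disk until the next step.
    """
    H = dict(first_hop_headroom)
    sends, kept = {}, {}
    for k in sorted(data_by_importance, reverse=True):  # highest importance first
        remaining = data_by_importance[k]
        while remaining > 0 and H and max(H.values()) > 0:
            j_star = max(H, key=H.get)        # j* = argmax_j H_ij
            amount = min(remaining, H[j_star])
            sends[(j_star, k)] = sends.get((j_star, k), 0.0) + amount
            H[j_star] -= amount               # headroom already used up
            remaining -= amount
        kept[k] = remaining
    return sends, kept
```

With 5 units of importance-2 data, 4 units of importance-1 data, and first-hop headrooms of 3 (to `a`) and 4 (to `b`), the important data is routed in full while 2 units of the less important data stay on the disk.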

2.3 Routing with “Ghost” Algorithm 

The routing scheme discussed above shares common features with the Shortest Path 
Algorithms (SPAs) for routing [3, 9], though it significantly outperforms them in 
this domain. This is not surprising as traditional shortest path algorithms do not 
handle data of varying importance well (unless quality of service considerations are 
included). Still, while it performs respectably on its own, it is susceptible to the 
same phenomena that hamper SPAs in traditional routing [17]: the satellites do not 
explicitly act to optimize G, and can therefore potentially work at cross-purposes. 

We now introduce the concept of “ghost” traffic to alleviate these concerns, and discuss several techniques for setting the levels of this traffic. Let us introduce distortions $\delta_{ij}$ to the headroom estimates $H_{ij}$. In other words, we set

$$H_{ij} = \min(b_{ij}, H_j) + \delta_{ij}$$

and then perform the routing according to the perturbed headroom estimates. The $\delta_{ij}$'s are free parameters to be determined via an optimization algorithm, and because their effect on the headroom is the same as that of actual data, we call them “ghost” traffic. Our goal then is to find a set of $\delta_{ij}$'s such that the performance of the system is substantially improved compared to that of the base, deterministic routing strategy.
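The perturbed estimate is a small change to the first-hop headroom. A minimal sketch, with `ghost_headroom` and its dictionary-based arguments as hypothetical choices:

```python
def ghost_headroom(links, H_to_earth, delta=None):
    """Perturbed first-hop headroom estimates H_ij = min(b_ij, H_j) + delta_ij.

    links: {j: b_ij} for the neighbors j of satellite i.
    H_to_earth: {j: H_j}, the best headroom from neighbor j to Earth.
    delta: {j: delta_ij} ghost traffic; missing entries default to 0 (the baseline).
    """
    delta = delta or {}
    return {j: min(b_ij, H_to_earth[j]) + delta.get(j, 0.0) for j, b_ij in links.items()}
```

With every $\delta_{ij} = 0$ this reduces to the baseline estimate; by the formula, a positive $\delta_{ij}$ makes link $j$ look roomier than it is and so draws more traffic, while a negative one diverts traffic away.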



If we collect all the $\delta_{ij}$'s into a single vector $\delta$, it becomes clear that we are faced with a multidimensional optimization problem: each particular choice of $\delta$ yields a particular global performance $G(\delta)$.
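Treating $G(\delta)$ as a black box, random hill climbing over a discretized $\delta$ can be sketched as below. The discretization into `n_bins` bins, the starting bin, the reading of “one bin away” as a ±1 change in a single component, and the toy objective mentioned afterwards are all assumptions for illustration.

```python
import random


def random_hill_climb(G, n_agents, n_bins, steps, seed=0):
    """Random hill climbing over a vector of binned ghost-traffic levels.

    G: black-box world utility evaluated on a list of bin indices (one per delta_ij).
    Each proposal moves one randomly chosen component by one bin (an assumption
    about what "one bin away" means); only strict improvements are accepted.
    """
    rng = random.Random(seed)
    x = [n_bins // 2] * n_agents  # start every delta in the middle bin
    best = G(x)
    for _ in range(steps):
        y = list(x)
        i = rng.randrange(n_agents)
        y[i] = min(n_bins - 1, max(0, y[i] + rng.choice((-1, 1))))
        g = G(y)
        if g > best:
            x, best = y, g
    return x, best
```

On a separable toy objective such as `lambda x: -sum((b - 2) ** 2 for b in x)` with four agents and ten bins, the search should climb from the middle bin to bin 2 in every component; in the satellite problem each call to `G` would instead require simulating the constellation.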


3 Experimental Results 

In this section we compare the performance of bi-utility search to that of team games and that of hill climbing on the problem of setting the “ghost” traffic levels to maximize the provided world utility. In this context we associate an “agent” with each $\delta_{ij}$ (i.e., the amount of ghost traffic on a particular link). We present results for the following five algorithms:

• Baseline: The algorithm outlined in Section 2.2, with no learning.

• Team Games: Each agent uses RL to try to independently maximize 
world utility. 

• Random search: At each step, a random $\delta$ vector is generated, and that vector is probabilistically “accepted” based on the world utility it provides (simulated annealing).

• Random Hill Climbing: Similar to random search, but the new $\delta$ vector is constrained to be one “bin” away from the old one.

• G-Based Hill Climbing: Similar to random hill climbing, but the explo- 
ration step is guided by the RL estimates of the G rewards associated with 
each action under consideration. 
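The list above only outlines G-based hill climbing, so the following is one possible reading of it under our own assumptions, not the authors' exact algorithm. Each agent (one $\delta_{ij}$) keeps a sample-mean estimate of the world utility $G$ observed whenever its $\delta$ sat in a given bin, every agent receives the same $G$ (team-game style), unvisited bins are scored optimistically, and the exploration step proposes the single-agent move with the highest estimate instead of a uniformly random one. The optimistic scoring, the ε-random fallback, and all names are hypothetical.

```python
import math
import random


def g_guided_hill_climb(G, n_agents, n_bins, steps, epsilon=0.1, seed=0):
    """Hill climbing whose exploration step is guided by per-agent estimates of G.

    Q[i][b]: agent i's sample-mean estimate of G observed with its delta in bin b.
    Requires n_bins >= 2 so that every agent has at least one neighboring bin.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_bins for _ in range(n_agents)]
    N = [[0] * n_bins for _ in range(n_agents)]

    def record(config, g):
        # Every agent is rewarded with the same world utility G.
        for i, b in enumerate(config):
            N[i][b] += 1
            Q[i][b] += (g - Q[i][b]) / N[i][b]  # incremental sample mean

    def score(move):
        i, b = move
        return Q[i][b] if N[i][b] else math.inf  # try unvisited bins first

    x = [n_bins // 2] * n_agents
    best = G(x)
    record(x, best)
    for _ in range(steps):
        moves = [(i, x[i] + d) for i in range(n_agents) for d in (-1, 1)
                 if 0 <= x[i] + d < n_bins]
        if rng.random() < epsilon:
            i, b = rng.choice(moves)       # occasional undirected exploration
        else:
            i, b = max(moves, key=score)   # guided exploration step
        y = list(x)
        y[i] = b
        g = G(y)
        record(y, g)
        if g > best:                        # standard hill-climbing acceptance
            x, best = y, g
    return x, best
```

The acceptance rule is unchanged from ordinary hill climbing; only the choice of which neighbor to evaluate uses the learned estimates, which is the distinction the bullet list draws between the two hill climbing variants.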




Figure 1: (a) Performance of the various algorithms in the satellite communication domain; (b) Exploration by traditional and G-guided hill climbing algorithms.

Figure 1(a) compares the performance of the five algorithms described above on a problem with 20 satellites of moderate connectivity (150 $\delta$'s), averaged over 100 runs (the resulting error bars are too small to depict). The team game approach provides an improvement over the baseline algorithm. However, random search shows that the team game strategy fails to take full advantage of the possible gains. The hill climbing results provide another significant jump over the gains achieved through simulated annealing. While both hill climbing algorithms are superior to team games, the G-guided search, which incorporates components of both team games and hill climbing, clearly outperforms traditional hill climbing.




Figure 1(b) helps explain why G-guided exploration is superior to traditional hill climbing. The plots show the states generated by the exploration steps of the two hill climbing algorithms. The exploration guided by G clearly produces better states, which directly translates into higher performance. Furthermore, the quality of the states generated by the random hill climbing algorithm's exploration shows no improvement over time. In contrast, the exploration steps generated by the G-guided hill climbing algorithm show a clear improvement.


4 Conclusion 

Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Team game solutions to this problem provide some relief, but often fall short of the potential gains (e.g., due to a lack of coordination among agents). Local search algorithms provide another approach to this problem, but also fail to maximize the gains, in that they do not exploit any knowledge acquired from previously explored states.

In this article we present an algorithm that blends these two concepts into a global-utility-directed search algorithm. We apply this algorithm to a distributed information routing problem where the global objective is to minimize the loss of importance-weighted data. By accurately setting the “ghost” traffic introduced to prod the agents to act to optimize world utility, this algorithm outperforms both traditional hill climbing algorithms and team game algorithms.